A simple hierarchical approach to modeling distributions of substitution rates.

Authors

  • Sergei L Kosakovsky Pond
  • Simon D W Frost
Abstract

Genetic sequence data typically exhibit variability in substitution rates across sites. In practice, there is often too little variation to fit a different rate for each site in the alignment, but the distribution of rates across sites may not be well modeled using simple parametric families. Mixtures of different distributions can capture more complex patterns of rate variation, but are often parameter-rich and difficult to fit. We present a simple hierarchical model in which a baseline rate distribution, such as a gamma distribution, is discretized into several categories, the quantiles of which are estimated using a discretized beta distribution. Although this approach involves adding only two extra parameters to a standard distribution, a wide range of rate distributions can be captured. Using simulated data, we demonstrate that a "beta-" model can reproduce the moments of the rate distribution more accurately than the distribution used to simulate the data, even when the baseline rate distribution is misspecified. Using hepatitis C virus and mammalian mitochondrial sequences, we show that a beta- model can fit as well as or better than a model with multiple discrete rate categories, and compares favorably with a model that fits a separate rate category to each site. We also demonstrate this discretization scheme in the context of codon models specifically aimed at identifying individual sites undergoing adaptive or purifying evolution.
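The construction described in the abstract can be illustrated with a minimal Python sketch (standard library only). It is one possible reading of the abstract, not the authors' implementation. The assumptions are: the baseline is a mean-one gamma distribution with shape `alpha`; the category boundaries in cumulative probability are the quantiles of a Beta(`p`, `q`) distribution at 1/N, ..., (N-1)/N; and each category's rate is the conditional mean of the gamma within its interval. All function names (`gamma_cdf`, `beta_cdf`, `inverse_cdf`, `beta_gamma_rates`) are hypothetical, and the series used for the CDFs are only suitable for modest parameter values.

```python
import math

def gamma_cdf(x, shape, rate):
    """CDF of Gamma(shape, rate) via the power series for the incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    if math.isinf(x) or z > 500.0:
        return 1.0  # upper tail is negligible here for modest shapes
    term = 1.0 / shape
    total = term
    k = 1
    while term > 1e-16 * total and k < 10000:
        term *= z / (shape + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))

def beta_cdf(x, p, q):
    """CDF of Beta(p, q) via a hypergeometric series, using symmetry so that x <= 0.5."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > 0.5:
        return 1.0 - beta_cdf(1.0 - x, q, p)
    coef = 1.0          # running value of (1-q)_k / k! * x^k
    total = 1.0 / p
    for k in range(1, 500):
        coef *= (k - q) / k * x
        total += coef / (p + k)
        if abs(coef) < 1e-16:
            break
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return total * math.exp(p * math.log(x) - log_b)

def inverse_cdf(cdf, target, lo, hi):
    """Invert a monotone CDF on [lo, hi] by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta_gamma_rates(alpha, p, q, n_cat):
    """Return (weights, rates) for n_cat categories under the assumptions stated above."""
    # Cumulative-probability breakpoints: Beta(p, q) quantiles at i / n_cat.
    cum = [0.0]
    for i in range(1, n_cat):
        cum.append(inverse_cdf(lambda x: beta_cdf(x, p, q), i / n_cat, 0.0, 1.0))
    cum.append(1.0)
    weights = [cum[i + 1] - cum[i] for i in range(n_cat)]
    # Map the breakpoints to cut points of the mean-one gamma baseline.
    cuts = [0.0]
    for c in cum[1:-1]:
        hi = 1.0
        while gamma_cdf(hi, alpha, alpha) < c:
            hi *= 2.0
        cuts.append(inverse_cdf(lambda x: gamma_cdf(x, alpha, alpha), c, 0.0, hi))
    cuts.append(math.inf)
    # For a mean-one gamma, the integral of x*f(x) over an interval equals the
    # probability a Gamma(alpha + 1, rate=alpha) variable assigns to that interval.
    mass = [gamma_cdf(c, alpha + 1.0, alpha) for c in cuts]
    rates = [(mass[i + 1] - mass[i]) / weights[i] for i in range(n_cat)]
    return weights, rates

weights, rates = beta_gamma_rates(alpha=0.5, p=1.0, q=1.0, n_cat=4)
```

Under these assumptions, setting `p = q = 1` makes the beta distribution uniform, so the categories become equiprobable and the sketch reduces to an ordinary equal-probability discretized gamma. Other values of `p` and `q` shift the category weights while the weighted mean rate stays at one.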

Similar resources

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...

Modeling the spread of infectious diseases based on the Bayesian statistical approach

Background and Aim: Health surveillance systems are now paying more attention to infectious diseases, largely because of emerging and re-emerging infections. The main objective of this research is presenting a statistical method for modeling infectious disease incidence based on the Bayesian approach. Material and Methods: Since infectious diseases have two phases, namely epidemic and non-epidem...

A Study on Effective Factors on New Product Development with an Emphasis on Fuzzy Hierarchical Analysis Approach

Nowadays the new product and its importance are considered as an essential strategy for staying in business. Though the hi-tech industry has focused on value innovation and improving the quality of the new product development (NPD) process to drive new product performance, new product success has not changed dramatically over the years. This study presents a novel approach based on structural equatio...

Impact of Proximate Determinants on Fertility Transition Behind the Socio-demographic Factors in Bangladesh: A Hierarchical Approach from the National Survey

Introduction: Fertility is a vital ingredient in measuring population fluctuation. Bangladesh is still above the level of transplantation of fertility. The target of this research was to determine the proximate factors on fertility rate reduction in Bangladesh. Methods: The 2014 Bangladesh Demographic and Health Survey (BDHS) was used as secondary data. T...

Spatial count models on the number of unhealthy days in Tehran

Spatial count data is usually found in most sciences such as environmental science, meteorology, geology and medicine. Spatial generalized linear models based on Poisson (Poisson-lognormal spatial model) and binomial (binomial-logitnormal spatial model) distributions are often used to analyze discrete count data in which spatial correlation is observed. The likelihood function of these models i...

Journal title:
  • Molecular Biology and Evolution

Volume 22, Issue 2

Pages -

Publication date: 2005